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ABSTRACT 

We present a method to simultaneously model the dust far-infrared spectral 
energy distribution (SED) and the total infrared — carbon monoxide (CO) integrated 
intensity (5 'ir—J co) relationship. The modelling employs a hierarchical Bayesian 
(HB) technique to estimate the dust surface density, temperature (Teff), and spectral 
index at each pixel from the observed far-infrared (FIR) maps. Additionally, given the 
corresponding CO map, the method simultaneously estimates the slope and intercept 
between the FIR and CO intensities, which are global properties of the observed 
source. The model accounts for correlated and uncorrelated uncertainties, such as 
those present in Herschel observations. Using synthetic datasets, we demonstrate the 
accuracy of the HB method, and contrast the results with common non-hierarchical 
fitting methods. As an initial application, we model the dust and gas on 100 pc scales 
in the Magellanic Clouds from Herschel FIR and NANTEN CO observations. The 
slopes of the log^iR—log/co relationship are similar in both galaxies, falling in the 
range 1.1 — 1.7. However, in the SMC the intercept is nearly 3 times higher, which 
can be explained by its lower metallicity than the LMC, resulting in a larger 5 'ir 
per unit Ico- The HB modelling evidences an increase in Teu in regions with the 
highest Ico in the LMC. This may be due to enhanced dust heating in the densest 
molecular regions from young stars. Such simultaneous dust and gas modelling may 
reveal variations in the properties of the ISM and its association with other galactic 
characteristics, such as star formation rates and/or metallicities. 
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1 INTRODUCTION 


Understanding the formation of planets, stars, and the dy¬ 
namics of the host galaxies, including galaxy clusters, in¬ 
variably requires a thorough assessment of the physical con¬ 
ditions of the interstellar medium (ISM). Targeted surveys 
employing ground and space based telescopes have provided 
a wealth of multi-wavelength data, enabling the concurrent 
study of various facets of the ISM. For instance , infrared (IR) 
and sub-millimeter observations from Spitzer (I Werner et al.l 


|2004| . Herschel (iPilbratt et al.l2010ll . CARM^ NANTEl^ 
(and other ground based observatories), have revealed the 
properties of dust and gas prevalent in the ISM, such as 
the temperature, chemical composition, and density. Flex¬ 
ible statistical methods and well tested theoretical models 
are necessary to accurately estimate such properties, as well 
as identify unanticipated features in the large and diverse 
observational datasets. 


^ https://www.mmarray.org 
^ https://www.astro.um-koeln.de/nanten2 
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The spectral energy distribution (SED) of dust can re¬ 
veal its physical characteristics. Dust grains absorb stellar 
radiation, and releases this heat in the form of far infrared 
(FIR) emission. Observed FIR int ensities appear to follow 
a power-law modified blackbody (iHildebrandl 1 1983 1 . Dust 
properties such as the temperature and emissivity control 
the shape of the emergent FIR SED. Therefore, accurately 
constraining the SED parameters from IR observations pro¬ 
vides information about the physical characteristics of dust. 
However, modelling the SED is not trivial, as there are 
significant degeneracies between the parameters. Notably, 
when fitting SEDs without careful consideration of noise 
and the underlying correlation between physical properties, 
the degeneracy between the dust temperature and spectral 
index leads to an artifi cial anti-correlation between the esti¬ 
mated parameters (e.g. Blain et al.ll2003l:ISchnee et al ]|2007l: 


IShettv et al.ll2009bl : ljuvela fc Ysardll2012li . Hierarchical sta¬ 

tistical methods can rigorously account for degeneracies and 
measurement uncertaintie s, thereby providi ng accurate SED 
parameter estimates (e.g. iKellv et al.ll2012ll . 

On large spatial scales ( 50 — 100 pc), FIR emis¬ 

sion traces warm dust heated by the stars. Emission from 
colder dust (with temperatures 20 K) will pale in com¬ 
parison, as the FIR intensity rises strongly with tempera¬ 
ture. All inferred dust parameters, such as the temperatures 
and column densities, must be consistent with other con¬ 
straints, including the relationship between the dust and gas, 
and/or the nature of stellar radiation. Hierarchical models 
are well-suited for such analyses, as they naturally allow for 
the simultaneous parameter estimates of diverse ISM com¬ 
ponents from multi-wavelength datasets. Such simultaneous 
modelling could potentially further reveal the association, 
or lack thereof, of the dust and the stellar component. 

The molecular ISM is considered to be the direct pre¬ 
cursor to the formation of stars, since most young stars 
are pre dicted and observed to be embedded in molecular 
gas (see [Mac Low fc Kles senll2004l : iMcKee fc Ostrik^l 19771 : 
iFukui fc Kawamural 201Cll . and references therein). The low¬ 
est rotational transitions of carbon monoxide (CO) are fre¬ 
quently utilized as tracers of molecular gas. In particular, 
the J — 1 — 0 transition is easily excited at typical densities 
(n ~ 100 cm“®) and temperatures (10 — 100 K) of molecular 
clouds, and emits at frequencies (« 115 GHz) easily detected 
with ground based sub-mm telescopes. It is therefore one of 
the most widely employed tracers of the star-forming ISM. 

Given the formation of stars in the molecular ISM 
traced by CO, and dust heating from young stars, there 
is an expectatio n for som e corre lation between the CO and 
FIR intensities. ISchmidj lll959ll predicted that the rate of 
star formation (Esfr) should be g overned by the amount 
of gas through a power-law scaling. iKennicu m (Il989l.ll998ll 
indeed found a tight relationship between Esfr and the 
molecular gas surface density (Emoi) when integrating over 
whole galaxies. More recently, resolved observations have 
also revealed an i ncreasing trend o f Esfr with Emoi (e.g. 
iBigiel et al.ll2008l : iLerov et al.ll2012h . However, the indices 
of the power-law relationship appear to vary betwee n nor¬ 
mal galaxies fe.g. lBigiel et al.ir2008l : IShettv et al.ll2013h . with 
most galaxies appearing t o favor a sub-linear relationship 
(IShettv et al.l 12013 . l2014bll . Many of these studies rely on 
either on monochromatic IR— Esfr tracers, such as 24 pm, 
or a second tracer to account for the un-absorbed stellar 


radiation, such as UV or Ha. In normal and starbursting 
galaxies, dust absorbs nearly all UV radiation, and so its 
total IR emission is employed as a proxy for the star for¬ 
mation rate, though th ere can be signif i cant uncertaintie s 
in such conversions le.g. lDale et al.|[2005l : IPooe et al.ll2006l ~l. 
Dust radiative heating from youn g stars is expected t o de¬ 
crease in low metallicity systems ( Calzetti et al.ll2007h . re¬ 
quiring alternative conversion factors c ompared to normal 
or st arbursting galaxies (see review bv iKennicutt fc Evans! 
I 2 OI 2 II . Indeed, questions remain on the relationship between 
the CO brightness and any star formation tracer such as 
the dust luminosity, including the possible effects of other 
galaxy properties, such as Hubble type, metallicity, and/or 
stellar mass. 

Accurately estimating any relationship between the 
emission from gas and dust requires sound statistical meth¬ 
ods. For example, linear regression is commonly employed 
for estimating the slope of the gas — SFR relationship. Some 
linear regression techniques are known to produce inaccu¬ 
rate parameter estimates. For instance, when measurement 
uncertainties in the predict or is ignored, the best-fit slope 
will be biased towards zero dAkritas fc Bershadvlll99^ . Ad¬ 
ditionally, when the dataset consists of repeated measures 
from a number of individuals, fitting a single line to the 
pooled data conceals variations in the parameters between 
individual members within the population. Hierarchical sta¬ 
tistical methods can naturally account for both of these 
issues, and have been shown to provide accurate param¬ 
eter estimates for both linear and no n-linear models in¬ 
cluding measurement uncertain ties (e.g. ICarroll et aLllioOhl : 
ICelman et al.lliool : iKellvIlioOTh . 

In this work, we develop a hierarchical Bayesian method 
to assess the relationship between the CO and total FIR in¬ 
tensities. The method simultaneously estimates the param¬ 
eters of the dust SED at each position, as well as the global 
underlying CO — total FIR relationship. Such a hierarchi¬ 
cal method estimates the spatial variation in dust proper¬ 
ties, while self-consistently measuring the large scale rela¬ 
tionship between dust and gas. Consequently, the resulting 
parameter estimates and their distributions probe for any in¬ 
consistencies between the observational data and the model 
SED and CO — FIR relationship. Furthermore, measure¬ 
ment uncertainties are naturally propagated throughout the 
analysis, leading to final parameter estimates that robustly 
accounts for observational noise. 

As a first application, we apply the method to Herschel 
FIR and NANTEN CO observations of the Large and Small 
Magellanic Clouds (LMC and SMC). Due to their proxim¬ 
ity (50 - 60 kpc), these galaxies allow for detailed studies 
of their ISM. The Magellanic Clouds have metallicities that 
are lower than t he Milky Way, 0.2 Zq fo r the SMC and 0.5 
Z 0 in the LMC (iRussell fc Dopit^ll992l L The factor of ~2 
variation in metallicities between the Magellanic Clouds may 
cause detectable differences in their d ust and gas properties . 
From the HERITAGE survey data dMeixner et al.f [20131 1. 
iRoman-Duval et al.l (l2014h found significant differences in 
the dust-to-gas ratios between the two galaxies. Given their 
large angular size, the Magellanic Glouds present the oppor¬ 
tunity to test the effect of averaging over large regions, and 
compare any derived trends with the small scale properties 
of the ISM. Here, as an initial application of the hierarchi¬ 
cal Bayesian method, we model the SED of the Magellanic 
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Clouds on large 100 pc scales with single modified black- 
bodies, in conjunction with CO maps to quantify the CO — 
FIR relationship. 

This paper is organized as follows. In the next sec¬ 
tion, we present the equations governing the assumed CO — 
FIR relationship, and the model dust SED. We also provide 
a description of hierarchical Bayesian methods before dis¬ 
playing the full hierarchical model. In Section [3] we demon¬ 
strate the accuracy of the model on two synthetic datasets. 
For comparison, we also present results using common non- 
hierarchical fitting methods. In the subsequent section, we 
apply the hierarchical Bayesian method to observational 
data of the Magellanic Clouds. We interpret and discuss the 
results in the context of previous results in Section [5l and 
summarize the method and our hndings in Section 


2 MODELLING METHOD 

We explore the relationship between gas and dust through¬ 
out a galaxy by characterizing the relationship between CO 
and FIR emission across the projected face of the galaxy, 
treating the lines of sight corresponding to pixels as probing 
regions that independently sample a global, stochastic CO¬ 
FIR relationship. A more sophisticated treatment would ad¬ 
ditionally model spatial correlation and dependence. Our 
focus here is to develop and implement a framework that 
can correctly account for pixel-specific variability (from mea¬ 
surement error). Our hierarchical framework can be gener¬ 
alized to account for spatial dependence, but we leave that 
for future work. 


2.1 Underlying relationships 


Thermal emission from dust grains is usually modelled with 
a power-law modified Plank spectrum. The observed surface 
brightness Sv at a given frequency v is: 

S. = EdB.(Tefi)Ko(:z/i/o)'’'*“ (1) 

where Ed is the dust surface densit)0, Teff is the dust tem¬ 
perature, and Kv = K,o{v /is the frequency dependent 
dust opacity, which depends on the spectral index The 
SED of a pure blackbody follows the Planck function: 


B^{Tes) 


2hiy^/c^ 

exp{hi' / ksTes) — 1 


( 2 ) 


where h, c, and ks are the Planck constant, speed of light, 
and Boltzmann constant, respectively. 

Equation © is a simplified model of any emergent SED, 
as it employs a number of approximations about the dust 
along the line of sight (LoS). As dust absorbs the radia¬ 
tion from young stars, the dust temperature depends on 
the distances to the these stars. Consequently, a single TeS 
does not accurately model the emergent SED. The estimated 
temperature may better reflect a luminosity weighted tem¬ 
perature, and is a n upper limit to the coldest temperature 
along the LoS (e.g. IShettv et al.ll2009bl : iMalinen et al.|[201ll : 

ljuvela fc Ysardll2012l '). Similarly, dust grains along the LoS 


® In cgs units, Ej as written in equation Q corresponds to g 
cm“^. In this work we convert Sj to Mq pc“^. 


can vary in composition or size and may have a range of spec¬ 
tral indices. Accordingly, we will only consider the estimated 
temperature and spectral index to be adequate approxima¬ 
tions for describing t he shape of the emer gent SED. Follow¬ 
ing the convention of lCordon et al.l ll2014ll . we will therefore 
refer to these quantities as the “effective” temperature or 
spectral index (hence the subscript on Tefr and Pea)- 

The total FIR intensity 5 ir can be computed by inte¬ 
grating the SED over all frequencies: 

Sir = J Svdv = Ed y Btj{Tea)K,vdv ( 3 ) 

We model the relationship between the CO intensity Ico 
and Sir through a power-law, which translates to a linear 
trend in log-space: 

log(SiR/Sfld) =A + n log(7co//fid) (4) 

where Siid = lMJyHzsr“^ and laa ~ lKkms“^ are fidu¬ 
cial values to make the arguments of the logarithms dimen- 
sionlessQ Note that when CO is assumed to be a linear 
tracer of molecular gas, Ico will be proportional to Emoi. 
A common assumption for normal star-forming galaxies is 
that dust is mostly heated by newly born stars, so that dust 
thermal emission indirectly traces the amount of star forma- 
tion|3 Since the amount of dust heating depends on metal- 
licity, among other ISM properties, the FIR — SFR relation¬ 
ship may vary with environment. We choose not to employ 
any conversion factor, and only focus on estimating the total 
FIR — CO relationship. Note that if there were any constant 
FIR — SFR scaling, the slope in equation is the exponent 
in the Kennicutt-Schmidt (KS) relationship: 

Esfr = aE;"oi. (5) 


For given values of A and n, the observed CO map and 
equation set Sir at each location. Combined with the 
dust SED model in equations © - ®, the dust surface 
density is constrained: 


log Ed 


log Ed = log Sir - log 
= A -I- n log Ico — log I 


J B„{Tsf!)K^( 

J B^{Te«)K,^dv^ . 


(6) 

( 7 ) 


Equation 0 relates the dust surface density Ed with 
the true CO intensity Ico, and the integrated SED 
/ B,,{Te«)K,vdv. 


2.2 Measurement Uncertainties 

Let T>co,i denote the data used to estimate Ico,i, the CO 
intensity for location i. A data processing pipeline produces 
a measured intensity for the location, /co,i(L’co.i)i and an 
uncertainty for the intensity, o'co,obs(I’co.i)- We interpret 
these as summaries of a likelihood function for the CO in¬ 
tensity at location i that is log-normal, i.e., Gaussian in 


In subsequent equations, we will omit these fiducial quantities 
in the logarithms for brevity and to follow convention. 

® As we further discuss in Section|5] other properties besides SFR 
influence the dust surface density and temperature that affect the 
emergent SED. 
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log(7co) (at least to a good approximation) 0 We denote 
this likelihood function by 


7i(/co,i) = p{'C>co,i\Ico,i) 


oc exp 


1 


2cr 


2 

CO.obs 



( 8 ) 


Similarly, let denote the data used to estimate 

S'lR.ij, the IR intensity for location i in frequency chan¬ 
nel j. We denote the associated measured intensity by 
>5iR,ij(2?iR,ij), and the intensity uncertainty by crj(I>iR,ij), 
which we consider to be independent between pixels. Be¬ 
sides the usual random uncertainties (independent across 
pixels and channels), Herschel intensity estimates also have 
systematic uncertainty due to absolute calibration uncer¬ 
tainty (see § 4.1). To account for this, we include channel- 
dependent calibration parameters corresponding to an un¬ 
certain multiplicative factor in intensity, and thus additive 
in logarithm of intensity; we denote the additive parameter 
by Cj. The likelihood function for and Cj is 


^ij (5'lR,ij , Cj ) — I SlJi^ij , Cj ) 


OC exp 




Cj SlJi,ij 


2-1 


(9) 


We take Cj to be the same constant across all channels 
within a particular instrument (i.e., PACS and SPIRE), so 
that Cl = C2 = CpACSi and C3 = C4 = C^ = Cspirb. 

We note that in our hierarchical modelling, we approx¬ 
imate all uncertainties with log-normal distributions. This 
choice is motivated by the convenience of transforming all 
intensities into log space. Since the normally distributed un¬ 
certainties are all of order 10% or less, a log-normal approxi¬ 
mation is adequate. We have verified with a few simple tests 
that modelling normally distributed errors in the CO and IR 
intensities as log-normals in the HB model accurate recovers 
the underlying latent parameters in the posterior. 


2.3 The Hierarchical Model 


We adopt a Bayesian approach, addressing parameter es¬ 
timation questions by computing the posterior probability 
density function (PDF) for parameters 0 given the observed 
data T>, denoted 'P(Q\'D). Bayes’s theorem expresses the pos¬ 
terior PDF in terms of more accessible PDFs: 


■proi-ni = y(e)p(p|e) ^ v{v,e) 

V{V) v{v) ■ 


( 10 ) 


That is, the posterior is proportional to the product of a 
prior PDF, 7^(0), and a likelihood function 'P(X>|©), which 
is the probability of observing the data T) given 0 (con¬ 
sidered as a function of 0). Equivalently, the posterior is 
proportional to the joint distribution for the data and pa¬ 
rameters, 7^(1?, 0). The term in the denominator, 'P{T>), is 
constant with respect to the parameters, playing the role of 


° Formally, this is most likely a marginal or profile likelihood 
function, in that modeling the data will require estimating pa¬ 
rameters in addition to the CO intensity, such as background 
parameters. Uncertainty in these parameters may be propagated 
into the CO intensity estimate by marginalization (integration) 
or profiling (optimization). 



Figure 1. Example Directed Acyclic Graphs (DAG) showing 
conditional dependencies between parameters (white circles) and 
measured data (gray circles), (a) DAG indicating that the mea¬ 
sured GO data 7?co,i can be predicted given the true GO intensity 
^CO,i' (h) DAG where Fefr,i, PeH ii S’Ud Sir i determine the IR 
data I>iR,i,j at each frequency j. (c) DAG where the data are sim¬ 
ply the IR and CO intensities, which are conditionally dependent 
on the regression parameters A and n. See Section 12.31 


a normalization constant. It is the prior predictive distribu¬ 
tion for the data, also called the marginal likelihood. 

Since the posterior provides a probability density for a 
set of parameters (conditional on the data), in the Bayesian 
framework the estimated parameters are considered to be 
random variables themselves—but “random” in the sense 
of uncertain, rather than in the frequentist sense of varying 
upon repetition of the experiment. 

We will build up to our full hierarchical model for the 
CO and IR data across a galaxy image by considering three 
simpler inference problems that will appear as components 
of our full model. 

First, suppose we are solely interested in the CO in¬ 
tensity for a single location, so the parameter space is 
O = Ico,i- We might adopt a flat prior for Ico,i, in which 
case the posterior PDF for Ico would be proportional to the 
CO pixel likelihood factor of equation ([8]) (for the pixel of in¬ 
terest). Figure [T])a) depicts the conditional structure of this 
elementary application of Bayes’s theorem with a directed 
acyclic graph (DAG), with nodes corresponding to random 
(a priori uncertain) quantities (parameters or data), and 
directed edges (arrows) indicating conditional dependence. 
The single edge here simply indicates that the modelling 
information lets us predict the data when the parameter, 
Ico, is known. The data node is shaded gray to denote that 
the data become known at the time of analysis. As a whole, 
the DAG describes a factorization of the joint distribution, 
p{Ico,Hco) = p{Ico)p{Hco\Ico), i.e., the numerator in 
Bayes’s theorem. 

Next, suppose we are solely interested in the dust spec¬ 
trum parameters, 0 = {Sm,i,TeS,i, l3eS,i), for a particular 
location i, given the IR data for all channels. The likelihood 
function for the spectral parameters is the product of IR like¬ 
lihood factors given by equation ®, with the SiB.,ij values 
computed using the spectrum parameters. For simplicity, we 
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ignore the intensity calibration parameters for the moment. 
Then the likelihood function for the spectrum parameters is 


3 


oc exp 


XlR,i(Q) 

2 


( 11 ) 


where XiR,i(0) is the familiar goodness-of-fit measure, 

2 V- [log(5lR,,,/5fld)-log(SlR,i,(e)/5fld)]^ 

xiR,i(.'='j - 2^--■ 

i 1 

( 12 ) 

If a flat prior PDF is adopted for the spectrum parameters, 
the mode (the parameter vector that maximizes the poste¬ 
rior PDF) is the maximum likelihood estimate, which is the 
parameter vector that minimizes XiR,i(0)- 

Figure[T](b) depicts the conditional structure of this use 
of Bayes’s theorem. The top nodes denote independent prior 
PDFs for the three parameters. The three arrows indicate 
that all three parameter values must be specified to pre¬ 
dict the data. The five data nodes (for the five frequency 
channels) are depicted using a plate, a box with a label in¬ 
dicating repetition over a specified index (here the channel 
index, j). Crucially, a plate signifies that each case is con¬ 
ditionally independent of the others; that is, with 0 given, 
the probability for I >2 (say) is independent of the values of 
the data in other channels. 

Finally, suppose that the (true) CO intensity, Ico,i, 
and total IR luminosity, 5iR,i, were measured precisely 
for N pixels (i.e., the data are the precise values, rather 
than uncertain estimates from photometry). In this case, we 
could learn the dust-gas relationship parameters in equa¬ 
tion 0 = {A,n), via regression, i.e., using the N pairs 
(^co.i, 5iR,i), to infer the conditional expectation of S'lR^i 
given Ico.i (and 0). The DAG in Figure [T])c) shows the 
conditional structure of this regression model. 

Commonly, a regression analysis quantifies the scatter 
about the fit line. In the next section, we compare the re¬ 
sults from the hierarchical modelling with a non-hierarchical 
approach which utilizes standard regression methods. If we 
suppose the scatter about the regression line is Gaussian 
with standard deviation s, the likelihood function for esti¬ 
mating 0 = {A, n) in this problem is a product of normal 
distributions for the residuals. 




-^[log(SiR,i/Sfid) (13) 


- A - nlog(/co,i/dfld)]’ 


oc -:^exp 


Xlr(€)) 


(14) 


where Xlr(€>) is Ih® sum of squared residuals (normalized 
by s^) that is minimized in least-squares linear regression 
(LR), 


Xlr(0) — ^ 

i 


[log(S'iR,i/5'fld) — A — nlog(Jco,i/Jfid)]^ 


(15) 

To analyze the CO-FIR data, we are ultimately inter¬ 
ested in regression and estimation of {A,n). However, we 


do not have precise {Ico,i, SmD pahs; we have CO and IR 
data, providing uncertain estimates of these quantities. Per¬ 
haps the simplest way forward is to ignore the Ico,i and 
•S'lR.i uncertainties (or hope they will “average out”), using 
the estimates as if they were precise values in a linear regres¬ 
sion model like that depicted in Figure (iKc). But problems 
of this sort are well-studied in the statistics literature on 
measurement error and errors-in-variables problems, where 
it is known that, instead of averaging out, the uncertainties 
instead accumulate, producing inconsistent parameter esti¬ 
mates (i.e., estimates that converge to incorrect values as 
data accumulate). Multilevel or hierarchical models provide 
a flexible framework for full accounting of such uncertainties, 
avoiding these inconsistencies. 

Here we build a hierarchical model by composing the 
three DAGs described above (but leaving the Ico,i and Sm,i 
nodes open in the regression DAG, since the true values of 
these quantities are uncertain). In our HB model for the CO 
and FIR data, the parameter space is large (and grows in 
size with the data); 0 includes the regression parameters, 
{A,n), the uncertain CO and IR intensities, (/co,i, 5iR,i), 
and spectral parameters for each pixel, (Teff,i,/3efi,i, 5iR,i). 
Figure [2] shows the DAG connecting all of these quantities; 
the DAGs of Figure [T] appear as sub-structures within this 
HB model. Note that for the first two DAGs in Figure [T] 
we were focusing our attention on a single location (pixel); 
in the single-location analyses described above, we specified 
flat prior PDFs for Ico,i and the IR SED parameters (in the 
absence of better-motivated alternatives). In the HB model, 
we jointly analyze data from all locations. This enables us 
to learn about the population distributions for these param¬ 
eters. We do so by parameterizing these distributions, e.g., 
adopting a log-normal distribution for Ico,i and a bivariate 
normal for (Tes^i,/?efr,i) (details are provided below). The 
uncertain parameters specifying these population distribu¬ 
tions become additional nodes in the HB model, as shown in 
Figure in HB terminology, these are hyperparameters. For 
more informa tion on hierarch i cal B a yesian methods, we refe r 
the reader t o iGelm an et al.l (l2004fl . iGelman fc Hilll ll2007ll . 
and iKruschkd (12011 ). 

A DAG only specifies the qualitative structure of the 
HB model; to implement it, we must specify distributions 
(suitably conditioned) for every node. For instance, the 
DAG indicates that we need a distribution for the pair 
(Teft.i, Aff.i) conditional on their mean values and the co- 
variance matrix {ut, P/ 9 , Et,/?), independent of all other pa¬ 
rameters. We model the dust Teft,/ and /3efr,i with a bivariate 
normal PDF. Therefore, that node of the DAG corresponds 
to a factor in the joint PDF: 

'P{TeB, PeB\nT, = AT {T^B , PeB\nT , U P ,p), (16) 

where A/'(x|S) is the bivariate normal PDF for x given means 
and covariance S. We employ simpler, abbreviated standard 
statistical notation, so that equation (I16II is written as 

(Tefi, ^efr)^|irT,/3, Et,/3 ~ A/'(mt,/3, Et./s), (17) 

where (x)^ is the transpose of the vector x, and Pt./s = 
(ftT, Up)^■ Some parameters, such as the correlation between 
TeB,i and PeB.i, pT,p, are modeled with uniform distributions, 
denoted by U(min, max) spanning min and max. 

In order to evaluate the joint likelihood of all the pa¬ 
rameters, we model the distributions of most parameters as 
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Figure 2. DAG showing the conditional dependencies between 
the estimated parameters and data of the hierarchical model de¬ 
scribed in Section [2] The highest level contains the global pop¬ 
ulation, or hyper-, parameters such as the mean dust temper¬ 
ature fhe Ra,i and /9eff,i covariance matrix St,(3i which 

includes their correlation {pT,p)^ or the Kennicutt-Schmidt index 
(n). These parameters are required to estimate the other latent 
parameters occurring at the pixel level, denoted with a subscript 
i, and are located within the cyan box, or plate, in the DAG. The 
subscript j refers to the frequency of the observed IR intensity, 
located within an additional plate. The dust surface density Su.i 
can be evaluated given the other parameters (via egn. I30II . and is 
required to estimate the IR intensity Si^j. 


normals. The quantities A, n, Ico and tJco are the popula¬ 
tion parameters describing the intercept and slope of the 
Sir— /co relationship (Eqn.[4|, and the mean and standard 
deviation of the true CO intensity (in log space), respec¬ 
tively. Obtaining a reliable estimate for these quantities is 
one of the primary goals of the hierarchical fitting process. 
The distributions of the population parameters also require 
(hyper) priors. Again, we choose normal distributions with 
large variances. The choice of the mean values of these hy¬ 
perpriors does not affect the posterior, again due to the large 
number of datapoints that constrain these parameters more 
than the priors. In our tests of the method described in the 
next section, we investigate the influence of the prior distri¬ 
butions when the underlying data are not normal. 

For the CO and IR data nodes, we adopt log-normal 
likelihood functions as described above, in equations ([51) and 
®. We assign a population distribution to the CO intensi¬ 
ties that is log-normal, specified by two hyperparameters 
(population distribution parameters): a mean, log/co, and 
standard deviation, aco- We assign these parameters a nor¬ 
mal and a uniform prior, respectively. Thus, 

log(/cO,i/.Ifld)| log(/co/f^fld), crco ~ ^(log IcO / Rd, <^Co) ^ 

(18) 

log(7co//fld) ~V(0,5), (19) 

aco~W(0,2). (20) 

We treat each source’s temperature and power law index as 
drawn from a bivariate normal population distribution, 

{TeS, Pes)! |MT,,8, St,/3 ~ R'ip-T.p, St./s)- (21) 


The population mean temperature and index have normal 
hyperpriors; 

AIT ~ V(40,20), (22) 

Ai/3~V(0,3). (23) 

We write the temperature-index covariance matrix in terms 
of the (marginal) standard deviations for temperature and 
index, and a correlation parameter, pT,p- 

ot pT,po-TO-p , 2 ^. 

pT.pCFTiyp Op \ 

We assign uniform priors to these three hyperparameters: 


'^'r,p\PT,p,OT,op = 


PT,P^U{- 1 , 1 ), (25) 

aT~W(0,50), (26) 

opr^U{G,3). (27) 


Note that it is customary in non-hierarchical models to as¬ 
sign unknown standard deviations log-uniform priors (i.e., 
priors proportional to the inverse standard deviation). In 
a hierarchical setting, because of the weakened connec¬ 
tion between hyperparameters and the data, such an as¬ 
signment can unduly influence the posterior (even making 
it improper, i.e., unnormalizeable). Flat priors have bet- 
ter behavior while s till remaining only weakly informative 
dCelman et al.|[2004l . 

Finally, we specify informative but broad priors on the 
offset (A) and slope (n) in the log(5')-log(/) relationship, 

A ~ ^ 7 ( 0 , 100 ), (28) 

n~rAf( 0 , 4 ), (29) 

with TN{-, ) denoting a truncated normal, requiring the 
quantity to be greater than or equal to the specified mean 
(here requiring n ^ 0). These priors are motivated by previ¬ 
ous observational studies, which always find and increasing 
SsFR with Smoi- We follow convention and assume constant 
conversions between FIR and Esfr, as well as CO and Smoi. 
However, we do not favor any particular values for the con¬ 
version factors, hence employing a large range for the possi¬ 
ble value of the offset (A) parameter. 

The dust surface density can be computed from the 
modeled parameters described above: 


log Ed,i| A, n, /cO,i, Teff.i, /3eff,i = 

A-|-nlog(/co,i//fid)-log {Tea 

(30) 

With Ed,i, Tefi,i and Pesp, the true IR intensity at each pixel 
i is given by Equation [T] 


5i,j|(TefI,/3eff)i,Fd,i = 


2hv^ 


KO - 

yo 


/3eff,i 


eicp{hvj/ksTesp) — 1' 


(31) 


The reference opacity at 1^0=230 GHz is assumed to be fixed 
at Ko = 0.9 cm^ g“^ (lOssenkopf fc Henninel[l994l i. 

We note that we do not favor any particular pT,p so it is 
modeled with a uniform distribution between —1 (fully anti¬ 
correlated) and 1 (exact correlation). We model the mean 
temperature and spectral index, pT and pp, as normal dis¬ 
tributions, with large variances for their priors, allowing the 
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sampler to explore a wide range of possible values near typ¬ 
ical ISM conditions. Given the large nnmber of datapoints, 
the final estimates for and fijs are not sensitive to these 
choices. 

To carry ont the hierarchical analysis, we use the Stan 
probabilistic programming lang uage via its R language API 
IStan Development Teamll2014l ~)rlrl Stan performs efficient 
sampling of the parameters through a Hamiltonian Monte 
Carlo algorithm. We refer the reader to the mannal and 
website for more information about the details of Stan. 


3 SIMULATION STUDIES 

To test the HB fitting method, we construct synthetic 
datasets and compare the posterior with the adopted un¬ 
derlying parameters. For some parameters, we also investi¬ 
gate the effect of choosing incorrect prior distributions, e.g. 
a truncated normal distribution for Ico when the under¬ 
lying distribution is in fact log-normal. In this section, we 
present and discuss the results from three such investiga¬ 
tions, though we have performed the HB analysis on sev¬ 
eral realizations of the synthetic datasets. Here, we aim to 
demonstrate that the HB model described in Section [5] is 
properly implemented. We also illustrate that the HB anal¬ 
ysis provides more accurate parameter estimates than a sim¬ 
ple regression analysis. For the latter, we construct synthetic 
datasets for which the underlying parameters do not follow 
the distributions assumed in the HB model. 

The synthetic datasets are characterized by five latent 
variables: log Ea.i, log S'ir, log Ico, Teff.i, and /?efF,i- From 
these quantities and chosen observational uncertainties, we 
produce the observed CO and IR intensities, lco,i and Sij. 
The first two columns of Tables [T] and [5] show the summary 
statistics of the most interesting hyperparameters of the syn¬ 
thetic datasets. 

Figure [3] shows the distributions and bivariate relation¬ 
ships for the true values of the 5 individual pixel latent vari¬ 
ables in synthetic dataset A. The panels on the diagonal 
display the histograms of each variable. The panels below 
the diagonal show the scatter-plots between the variables 
identified by its row and column position. Similarly, the 
correlation coefficient between the corresponding variables 
are shown in the panels above the diagonal. For synthetic 
dataset A, we choose distributions of the underlying param¬ 
eters to match those in the HB modelling, as the aim of 
this initial test is verify that the HB model is implemented 
correctly. In Table [T] the second column lists the adopted 
values of the parameters in test dataset A. For each of the 
75 replicates (or “pixels”) of this dataset, we include a 15% 
uncertainty to create the synthetic observed CO intensities, 
and a random 2% uncertainty on the IR intensity. For the 
calibrated errors, we employ 10% and 8% uncertainties for 


^ Stan is publicly available at http://mc-stan.org/index.html 
® In Stan, we recover the correct likelihoods 
when we model the observed CO intensi¬ 
ties as Iog(/co,i/7fld)|log(7c0,i7fld),o'c0.obs ~ 

A7(log(/co,i7fld),o-co,obs) and the FIR with 
log(5i,j/5fld)|log(Si,j/Sfld),frj ~ W(log(Si,j/Sfld) -bCj,<7j). 


CpACS, and Cspirb, respectively, corresponding to the esti¬ 
mated uncertainties from the PACS and SPIRE instruments 
(see Section S^. 

In sampling the posterior, we run three MCMC chains 
and ensure sufficient mixing and convergence by inspect¬ 
ing that the R values of all the population parameters are 
very close to one, and that the effective sample siz43 is large 
llCelman et al.l[2004l : iFleeal et al.ll2008l ). Since we are inter¬ 
ested in 95% density intervals, we ensure that the effective 
sample sizes of the main population parameters of interest, 
Pt,/ 3 , least 3500, with corre¬ 

sponding R Ri 1. We choose random initial values for the 
latent parameters, ensuring a wide range (e.g. Teff.i between 
10 and 50 K). For synthetic dataset A, we run the chains for 
18,000 iterations, and the traceplots show that after ~1000 
draws the chains pass the R « 1 convergence testF^ 

The last two columns of Table [T] shows the posterior 
means and 95% highest posterior density (HPD) intervals. 
Clearly, the posterior means are very near the true un¬ 
derlying values, and the HPD bracket the adopted values. 
We have performed similar tests on additional synthetic 
datasets, each with slightly different underlying parameters. 
Since the HPD of the posterior brackets the true value, we 
can be confident of the HB model implementation in Stan. 

In order to explore the effect of the prior distributions, 
we consider datasets where the adopted distributions do not 
correspond to those assumed in the HB model. For this sec¬ 
ond test, we set Ico to have a truncated normal distribution. 
As a result, log/co is not normally distributed, and thereby 
differs from the model assumption. We choose a low value 
for Ico and a large variance. As we draw values for Ico.i, 
we discard any data with 7co,i < 0. This truncation thereby 
directly affects the S'ir distribution. Eliminating replicates 
with Ico.i < 0 also modifies the other latent variables. In 
addition, we draw Teu,! from a uniform distribution, while 
keeping the normal distribution of /3efr,i. With model mis- 
specification, the distributions of the latent variables devi¬ 
ate from their chosen prior distributions in the HB model. 
Figure U shows the distributions and bivariate relationships 
of the underlying quantities in synthetic dataset B, which 
consists of 100 datapoints. We include correlated calibra¬ 
tion and random uncertainties: 10% and 5% for the PACS 
bands, respectively, and 8% and 7% for the SPIRE bands. 
It is clear from Figure |4] that the synthetic quantities do 
not follow the distributions adopted in the priors of the HB 
model. 

We choose initial parameter values and chain lengths 
similar to those employed for synthetic dataset A. We find 
that 18,000 iterations for each of the three chains display 
convergence for all parameters, including i? «1 and large 
effect sample sizes 4000. Columns 1 - 4 of Table [2] lists 
the adopted values and the HB estimated values and their 
95% HPDs of the key parameters in synthetic dataset B. As 
with test A, the 95% HPDs of each parameter includes the 


® We also inspect latent parameters of a few replicates (pixels), 
such as Teff i and PeB,,- For these parameters, we also find R ^ I, 
with very high effective sample sizes ( ^ 15,000). 

For the synthetic datasets A and B, each chain required ap¬ 
proximately 1 hour on a single (2.5 GHz Intel) processor. 
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Table 1. Adopted and HB Estimated Population Parameters for Synthetic Dataset 


Parameter 

True Value 

Posterior Mean 

95% HPD 

Pt.P 

0.47 

0.38 

[0.18, 0.59] 


25.0 

25.1 

[24.7, 25.6] 

(JJP 

2.1 

2.2 

[1.8, 2.6] 

PPett 

1.98 

1.97 

[1.91, 2.05] 


0.3 

0.32 

[0.26, 0.37] 

logico 

1.57 

1.56 

[1.52, 1.60] 

CCO 

0.17 

0.17 

[0.14, 0.20] 

n 

1.5 

1.49 

[1.37, 1.63] 

A 

-2.00 

-1.95 

[-2.15, -1.75] 


^ This dataset contains 75 repeated measures of CO and IR luminosities, including 2% and 10% noise 
uncertainties, respectively, along with 10% correlated IR uncertainties. The effective sample size P:i4000 and 
R^l. 
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Figure 3. Distributions across pixel and bivariate relationships of quantities in synthetic dataset A. The diagonal panels label the 
variables associated with that row and column, and include their histograms. The upper triangle lists the correlation coefficient between 
the quantities in the same row and column. Panels in the lower triangle show the scatter plot of the corresponding data, with the red 
points indicating the mean values. 


true values of dataset B, and the posterior means are similar 
to the true values. 

We compare the HB results with parameters estimated 
via commonly employed methods. For the latter, we per¬ 
form a simple regression ignoring measurement error, or 
RIME, analysis of synthetic dataset B. For each pixel, we 
fit the modified blackbody to the five intensities by mini¬ 
mizing equation nil) and then perform linear regression be¬ 
tween the estimated logSm and log/co, described by equa¬ 


tion m- We simulate a large number of datasets with the 
same properties as synthetic dataset B, including a corre¬ 
lated term for the calibration uncertainty and a random 
noise term. We then perform a RIME analysis of each real¬ 
ization. We compare the resulting sampling distributions of 
the fit parameters to the estimates from the HB modelF^ 


We note that the sampling distributions of the estimated quan- 
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Figure 4. Relationships between quantities in synthetic dataset B. The panels are arranged in the same manner as Figure [S] 


Table 2. Intrinsic, HB, and RIME (x^) Estimated Population Parameters for Synthetic Dataset B 


Parameter 

True Value 

Posterior Mean 

95% HPD 

RIME Estimates 

Pt.P 

0.0 

-0.03 

[-0.32, 0.26] 

-0.22 ±0.1 

MTeft 

25.3 

25.5 

[24.9, 26.1] 

25.7±0.2 

CTjp 

2.2 

2.3 

[1.8, 2.8] 

2.2±0.3 


1.24 

1.18 

[1.10, 1.25] 

1.21±0.03 

ap 

0.33 

0.31 

[0.25, 0.37] 

0.32±0.03 

logfco 

0.15 

0.09 

[0.02, 0.16] 

- 

o'CO 

0.28 

0.32 

[0.28, 0.37] 

- 

n 

1.20 

1.20 

[1.16, 1.25] 

1.14±0.05 

A 

-2.00 

-1.99 

[-2.04, -1.94] 

-1.99±0.07 


^ This dataset contains 100 repeated measures of CO and IR luminosities. As for synthetic dataset A, the 
effective sample size ^4000 and .R=l. 


Figure [S] shows the RIME estimates, HB posterior 
mean, and the true values of Teff,i and Pea,i- The last column 
of Table [2] shows the best RIME parameter estimates and 
the 2a uncertainty of the underlying parameters from the 
sampling distribution. The RIME results indicate an anti¬ 
correlation between Teff.i and /3eff,i, with pT,/ 3 =—0.23. Since 


tities should not correspond to the 95% HPD of the HB results, 
as the range in posterior means would be a more suitable compar¬ 
ison. Nevertheless, we perform this comparison in order to obtain 
an estimate of the RIME uncertainty, and because similar anal¬ 
yses are commonly employed for ascertaining the uncertainties 
when fitting noisy observations. 


the degeneracy between Tefj^i and /3eff,i is explicitly treated 
in the HB method via the modeling of Tefi,! and with 
a bivariate normal distribution, the HB method is mor e re¬ 
liable in estimating their correlation llKellv et al.ll20l3 '). 

We can quantify the fits by computing the mean squared 
error (MSE) of the RIME estimates and HB samples. The 
MSE of /lefi.i of both methods are similar, 0.03. However, 
the MSE of from the RIME method is 2.86, which is 
about 6% higher than the mean MSE of 2.69 provided by the 
HB analysis. Even though the HB estimate misspecifies the 
population parameters, it is able to provide more accurate 
estimates of Tefi.i than the RIME point estimates. 

From the RIME estimates of Tefr,!, /3efr,i and Ea.i, we 
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Teff,i(K) 


Figure 5. Comparison of estimated and from the HB 

(blue) and point estimate methods (red contours, computed 
via kernel density estimation). The black points show the true 
values in synthetic dataset B. 


can integrate the SED defined by these quantities to obtain 
estimates of S'ir. Figure [6] shows the comparison of the true 
and ht Ico — 5ir relationships. The blue lines show 20 ran¬ 
dom draws from the HB posterior. The red points show the 
RIME estimated Sr plotted along with the observed (i.e. 
noisy) Ico values. This differs from the HB model, which 
explicitly estimates the true values of Ico; these values are 
used in the evaluation of the Ico — Sr relationship. The 
RIME estimate of the slope of Ico - Sr is 1.13, with 2a 
uncertainty 0.05, as indicated in Table [2] This value is flat¬ 
ter than the underlying value n=1.2, even at the 2a level. 
As shown by numerous previous works, when measurement 
uncertainties in the predictor are not treated in the linear 
regression, the fit slope will be biased towards 0 (see e.g . 
lAkritas fc BershadvIllQO^ : ICarroll et al.llioo^ : iKell^ 1^0071 ). 
Note also that a linear regression on observations with rela¬ 
tive IR noise levels 10% will estimate a slope with more 
bias than the results from synthetic dataset 2. The HB esti¬ 
mate of n, on the other hand, explicitly treats measurement 
uncertainties in all observables, thereby leading to signifi¬ 
cantly more accurate parameter estimate than the value. 

We have also checked the relationship between each IR 
band and Ico- Fits involving any individual IR intensities 
usually underestimates the underlying slope, with the mag¬ 
nitude of the bias depending on the noise characteristics, 
as well as the true values of the parameters. In general for 
higher noise levels, we have found that the estimated slopes 
using solely the longer wavelength intensities perform worse 
than those at shorter wavelengths. Such tests suggest that, 
similar to the RIME SED fits, employing any individual in¬ 
tensity will likely lead to an underestimate of the slope of 
the FIR — CO relationship. 

We have also performed a test on a synthetic dataset 
where the Tefr.i — PeS,i distribution is not bivariate normal. 
As we show in the next Section, for the LMC the Teff.i — 
/3eff,i distribution has curvature, indicating that the model is 
mis-specified. To assess whether such mis-specification un¬ 
duly influences the estimates of the main parameters of in¬ 
terest, such as n, we construct a sythetic dataset using the 


posterior mean of TeS,i and /3eff,i from the LMC. We create 
all the other latent parameters as in Synthetic Dataset B, 
and perform the HB model of this synthetic dataset. We find 
that the 95% HPD brackets the true value of the underlying 
latent parameters. A rigorous test of such mis-specification 
(e.g., verifying approximate coverage of 95% regions) would 
require the execution of the HB model on thousands of such 
sythetic datasets. Since this is unfeasible, our limited tests 
considering only 5 synthetic datasets suggests that, broadly 
speaking, the results are robust to mis-specifying the Tefi.i 
and distribution. 


Before concluding this section, we stress that though 
the HB model employs the simple assumption of a single 
(effective) temperature and spectral index, the total FIR 
luminosity estimate is not significantly affected by this con¬ 
straint. To verify this, we have constructed simple models 
with an additional colder dust component, with tempera¬ 
tures in the range 5 — 15 K, and with surface densities 
that are 5 — 20 larger than the warm component, and fit 
the single-component SED to the emergent IR intensities at 
the Herschel wavelengths. Though the fit parameters can be 
discrepant from true temperatures and spectral indices re¬ 
gardless of which component dominates the emergent SED, 
the estimated integrated SED recovers S'ir to a very high de¬ 
gree of accuracy, often within 0.5%. This demonstrates the 
robustness of S'ir, and that the fit temperature and spectral 
index should only be interpreted as “effective” parameters. 
Therefore, which determines the Rayleigh-Jeans tail 

of the fit SED, must be understood only as the appropri¬ 
ate numerical parameter required to reproduce the observed 
intensities within the single-component SED model. For an 
investigation focusing on the properties of dust along each 
LoS, additional IR fluxes in the Rayleigh-Jeans tail of the 
SED would be necessary to constrain the /Jeir.i, and a multi- 
component fit may better recover the observed trends. In 
this study, we are primarily interested in the total IR lumi¬ 
nosity that is not strongly affected by the precise parameter 
estimates of (?efr,i, so we can be confident that the informa¬ 
tion provided by the Herschel images is sufficient for esti¬ 
mating Sir. 


In summary, the tests on synthetic data demonstrate 
that the HB model performs well under certain condi¬ 
tions. For synthetic dataset A the underlying distributions 
matched those assumed in the HB model, resulting in a 
highly accurate and precise posterior. Synthetic dataset B 
includes variables with distributions which are discrepant 
from those assumed in the HB model. Nevertheless, the HB 
method is able to recover the underlying latent parameters 
within the 95% HPD. We note that a RIME fit to the in¬ 
tensities of synthetic dataset B recovers the mean values 
and marginal distributions of Tefi.i, and S'ir. However, 
both pt,/3 and n are significantly underestimated. Such dis¬ 
crepancies advocate for employing a hierarchical modelling 
technique, which relaxes the strong assumption of indepen¬ 
dence between Teff,i and adopted in the RIME fit. 


© 2015 RAS, MNRAS OOP.IHflTl 

















Modelling Dust and Gas 11 



4.2 Uncertainties and Photometric colour 
corrections 

The HB model requires (fixed) values for the standard devi¬ 
ations of the correlated and un correlated uncertain ties dis¬ 
cussed in Section [2.21 . Following iMiiller et al. liimi) , we set 
the following for the PACS (j=l or 2, corresponding to 100 
and 160 pm) uncertainties: 

CpAcs ~Af(0,log(l.l)) (32) 

CTj = lor2 = log(1.02). (33) 

As reported in iGriffin et al.l (l2013l ) and iBendo et al.l (l2013li . 
for SPIRE (j=3, 4, or 5, corresponding to 250, 350, and 500 
pm) we set 

CspiRE ~ A/’(0, log(1.08)) (34) 

Crj=3,4or5 =log(1.015). (35) 


Figure 6. Comparison of estimated and true Iqq ~ Sir rela¬ 
tionships for test dataset B. The blue lines are 20 random draws 
from the HB posterior, and the red points show the estimates 
of Sir at the observed values of /co, with the red line depicting 
the best fit. The black points show the true values in synthetic 
dataset B. 


4 APPLICATION: THE LARGE AND SMALL 

MAGELLANIC CLOUDS 

4.1 Observations 

As an initial application of the HB method on real data, 
we analyze the dust and gas properties of the Magellanic 
Clouds using the Her schel HERITAGE key project FIR data 
dMeixner et al.ll2013ll . in conjunction with CO J = 1 — 0 
NANTEN observations (iMizuno et al.l I2001al f9l. We con¬ 
volve all maps to a common resolution of 100 pc, since we 
are interested in the large scale properties of the ISM. We 
consider those pixels with Ico > 0 K km s“^. Given these 
constraints, the number of independent pixels we analyze in 
the SMG and LMC is 132 and 1584, respectively. To illus¬ 
trate the maps we employ for this work. Figure [7] shows the 
LMC and SMC map at 100 pc scales from the 100, 160, and 
250 pm observations, with the contours displaying the CO 
peaks. 

We have quantified the spatial correlation of residuals 
between neighbouring pixels as a result of the beam convolu¬ 
tion at the 100 pc resolution we employ. We have measured 
the correlation coefficient of the residual values of neigh¬ 
bouring lOOxlOOpc^ pixels in synthetic data with similar 
resolution and gridding to Herschel images. We find that the 
residuals can be correlated by upto ~20 per cent between 
adjacent pixels. We have checked that this correlation likely 
does not influence our results as we have run the HB model 
on every other pixel of the LMC maps, in which case the 
employed pixels are uncorrelated, and have recovered sim¬ 
ilar results to that from the full maps. Therefore, the ~20 
per cent introduced correlation between adajacent pixels is 
sufficiently small such that we can confidently apply the HB 
model, assuming conditional independence between pixels. 


To compare the model SEDs with the observations, the 
analytic modified blackbody (MBB) curves should be con¬ 
volved with the PACS and SPIRE filter response functionj^ 
to produce synthetic Herschel photometry. In order to speed 
up the calculations and because the corrections only depend 
on the shape of the SED, we pre-calculate colour correc¬ 
tions (CC) factors for MBBs with a range of T and /3eff 
values. The correction factors are defined as Fi,,Herschel = 
GGr/'F^^^mono, where Fi, 3 erschei is the flux-density value as 
measured by the Herschel bolometers and Fi/,mono is the 
monochromatic flux-density, i.e. the value in the perfectly 
sampled model SED, at the reference wavelength!^ We cal¬ 
culate CC values for 15 T ^40 K and 0^1 Pes ^3. In 
general the correction factors are relatively small but signif¬ 
icant. Over the immediate range of interest (20K<T<30 K 
and 0.5< /3eff <2) the maximum amplitude (HGC — 1||) of 
the correction is 0.02, 0.05, 0.05, 0.07 and 0.11 for the 100, 
160, 250, 350 and 500 /rm filters, respectively. 

Inside STAN, the colour correction factors are specified 
as a second order polynomial in T and /leff.i whose parame¬ 
ters have been determined by fitting a 2D plane to the CC 
values using the IDL routine SFIT. 


4.3 Results 

Figure [8] shows the marginal means of the estimated Teff.i 
and /3eff.i in each pixel of the SMG and LMC. Tables [3] and 
|T] display the posterior mean and 95% HPDs of the loca¬ 
tion and scale parameters of the Teff.i — PeS,i distributions. 
We find that the distributions of Teii.i and /leff.i are rather 
different between the two galaxies, with the LMC showing 
a larger anti-correlation (mean value pj, ^ = —0.74) than 
the SMC ijirp p = —0.15). In general, the galaxies also have 
different mean temperatures and spectral indices location 
parameters, with ~ 24.3 in the LMC and « 25.7 K in 
the SMC. The ~l-32 for the LMC, and «0.99 for the 
SMC. 

As is evident from Figure (H the Tesp — distribu¬ 
tion is not bivariate normal, especially for the LMC. This 
indicates that this aspect of the HB model is mis-specified. 


http://herschel.esac.esa.int/Docs/PACS/htnil/pacs_om.htnil 
http: //herschel. esac. esa. int/Docs/SPIRE/html/spire_oni.html 

We follow http: //herschel. esac. esa. int/twiki/pub/Public/PacsCalibration 
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Figure 7. Three-color image of the LMC and SMC at 100 pc scale from the 100, 160, and 250 \im Herschel observations. The blue 
contours show the CO intensities at by 0.2, 0.3, 0.5, 1 and 2 K km s~^ for the LMC, and 0.1, 0.2, 0.3, 0.5 and 1 K km s“^ for the SMC. 
The red box demarcates the observed region of the CO maps 
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Table 5. HB Estimated Parameters for the LMC 


Parameter 

Posterior Mean 

95% HPD 

logfco 

-0.59 

[-0.61, -0.57] 

o-CO 

0.32 

[0.28, 0.36] 

n 

1.25 

[1.11, 1.38] 

A 

-2.5 

[-2.6, -2.4] 


Table 6. HB Estimated Parameters for the SMC 

Parameter 

Posterior Mean 

95% HPD 

log/co 

-0.76 

[-0.82, -0.69] 

CCO 

0.21 

[0.17, 0.26] 

n 

1.47 

[1.21, 1.72] 

A 

-2.1 

[-2.3, -2.0] 


Figure 8. The /3efr,i relationships for the LMC (red) and 

SMC (blue). 


Table 3. HB Estimated T^g ; - /Seff.i parameters for the LMC 


Parameter 

Posterior Mean 

95% HPD 

Pt,p 

-0.74 

[-0.76, -0.7] 


24.3 

[24.1, 24.5] 

CTJ' 

4.3 

[4.1, 4.4] 


1.32 

[1.31, 1.33] 


0.28 

[0.27, 0.29] 


Table 4. 

HB Estimated Teff^i - 

/3efr i parameters 

Parameter Posterior Mean 

95% HPD 

PT,P 

-0.15 

[-0.34, 0.03] 

MTeff 

25.7 

[25.4, 26.0] 

CTj' 

1.8 

[1.6, 2.0] 


0.99 

[0.97, 1.00] 

<r/3 

0.1 

[0.09, 0.12] 


and the HB parameter T^g^i — /Seff.i estimates do not have 
a straightforward interpretation as moments of the popu¬ 
lation distribution; rather, they are moments of a bivari¬ 
ate normal approximation to that distribution. Nevertheless, 
from the simulation studies described in Section 3, we can 
be confident that the other estimated parameters are not 
strongly affected by this specification. One possible reason 
for the curved T^g^i — /3efi,i relationship is that the single¬ 
temperature SED does not accurately reproduce the shape 
of the observed spectrum, as we further discuss below. 

The HB estimates of T^g^i and /3efr.i in t he Magellanic 
Cloud s are comparable to previous estimates. lAguirre et al.l 
ll2003l) also found that the mean spectral index in the SMC 
is lower than in the LMC. The absolute values of their es¬ 
timates differ from those we estimate here, in part because 
they analyze all emission from the galaxies, whereas we only 
consider dense regions where CO is detected. 

To illustrate that the HB modelling has accurately re¬ 
produced the observed data, we show two sample SEDs from 
each galaxy in Figures |9] and [TO] The models and data cor¬ 
respond to the regions with some of the highest and lowest 
S'lR from each galaxy. The figures show the observed IR 
intensities, as well as the predicted intensities at the cor- 
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CO 

-^1.0 
^ 0.5 

05 
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Figure 9. Example SED fits of two pixels in the LMC. These 
pixels correspond to regions that have the largest and smallest 
S'lR,. The lines are random draws from the posterior, and the blue 
points with errorbars show the mean Herschel predicted intensi¬ 
ties with 2a uncertainties. The slight offsets between the predic¬ 
tions and the model SEDs are due to the calibration uncertainties 
and color corrections. The white circles show the observed inten¬ 
sities (-S'ij). The legend below each set of SEDs shows the 95% 
HPDs of the fit parameters. 



100 200 300 400 500 600 


A, (jim) 


— Models 



100 200 300 400 500 600 


A (jim) 


Figure 10. SED fits in pixels among the highest and lowest 
Lir in the SMC. The model, predictions, and observed data are 
marked in the same fashion as Figure [9] 


responding wavelengths. Note that the offset between the 
prediction and data from the model SEDs, especially at the 
highest freqnencies, is expected, primarily dne to the color 
corrections described in Section 1121 and to a lesser extent 
the correlated uncertainties. The observations fall within the 
expected range of the predicted intensities, indicating that 
the HB method accurately recovers the observed data. 

The differences in pT^p may be due to a number of 
possible variations in the structure of the ISM between 
the SMC and LMC . For instance, temperature gradi- 


Shettv et al.l 2009bl: 

Malinen et al.l 2011 

: Juvela & YsardI 

2 OI 2 I: Veneziani et alJ 

2 OI 3 I: Gordon et al. 

2014). There may 


be contrasting Teff,i gradients, either in extent or magni¬ 
tude, in the Magellanic Clouds leading to the difference 
in Pt,p- Other possibilities are that the dust-to-gas ratio 
and 500 pm excess may var-f^fe.g. Bernard et ahl l2008l : 

I Roman-Duval et al. ] I2OI0I : ICordon et al.l l2014l L Though 
measurement uncertainties can produce an artificial anti- 
correl ation due to the d e generacy between T e fr.i and 0ea.i, 
fe.g. iBlain et ah ] I2OO3I : IPupac et al.l l2003l : IShettv et al'l 
[20093), til® HB method explicitly models the Teft.i—/?efr,i cor¬ 
relation and a ccurately accounts for noise, thereby reducing 
this effect f see [Kelly et aLll2012l ). Other violations of model 
assumptions may also produce an artificial anti-correlation, 
such as a multiple dust components along the LoS leading 
to SEDs with a broken power-law Rayleigh-Jeans tail, and 
may be responsible for the apparent deviation of the LMC 
points from a bivariate normal distribution in EigurejS] As 
we have employed the same assumptions in modelling both 
Magellanic Clouds, the estimated differences in the — 
/3eft,i indicates the presence of fundamental differences in the 
dust characteristics of the LMC and SMC. 

Tables [S] and display the HB estimated parameters 
related to the Sir — 7co of the LMC and SMC. Figure 
m\ shows 50 random draws of the /co— Sir relationship for 
both galaxieJ^. The slopes of the two galaxies are similar; 
the posterior mean value of n is 1.3 and 1.5 for the LMC 
and SMC, respectively, with overlap in the 95% HPDs. On 
the other hand, the intercepts are clearly different. Per unit 
7co, the SMC has a ~0.4 dex higher Sir than the LMC. 
This difference is likely associated with the lower metallicity 
of the SMC, 0.2 Zq, compared to 0.5 Z© of the LMC. If Sir 
is a faithful tracer of the star formation rate, these results 
suggest that the star formation rate per unit 7co is higher 
in the SMC by the same ~0.4 dex factor. Numerous pre¬ 
vious observations have shown such tr e nds for lower metal¬ 
licity systems (e. g. ^^or_et_^ 1 19981 : iBolatto et al.! [201 ll : 

I Leroy et al.l l2007l : ICormier et alJ 20141 ) . The salient differ¬ 
ence is in the ability to trace the star forming ISM with CO. 
In lower metallicity environments, there is a dearth of CO, 
so a given 7co would be associated with a higher SFR com¬ 
pared to the higher metallicity case | Malongy-^^IacljllSSSl; 
Wolfire et al. 2010l: Clover fc Mac Low! I 2 OIII: IShettv et alJ 


2011al 3; Glover fc Clark! 20121 : Clark fc Gloverj 20151) . Ad- 


ditionally, regions with higher atomic densities, will contain 
more dust. Therefore, dust emission should be correlated 
with total (atomic -I- molecular) gas density, and the offsets 
between the galaxies in Figure[TT]may be partially due to the 
variations in total gas density, in addition to the metallicity 
differences. 

The mean dust temperature is clearly larger in the SMC 
than the LMC, contributing to the overall increase in 5ir 
(at a given Ico)- Such higher tempera tures are observed 
in ot her lower metallicity systems (e.g. iRemv-Ruver et ^ 
I2OI3I) . Figure \n \ shows the posterior mean relationship be¬ 
tween Tgffp and Ico- In the LMC, at large Ico ~ 0.5 K km 
s“^, there is strong evidence that Tesp increases from « 20 


However, we have checked that excluding the 500 pm obser¬ 
vations from the fits in the SMC does not significantly alter the 
HB estimated parameters. 

As the regression lines all fall within the errors, we can be 
confident that no additional scatter is required to model the 
7 co~'S'ir relationship. 
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Figure 11. Ico~ >S’ih, relationship in the LMC and SMC (50 
random draws from the posterior). Colored lines depict draws 
of regression lines from the marginal posterior distribution for 
{A, n) for the SMC (red) and LMC (blue). Points with error bars 
show the marginal posterior means and marginal (1-D) 95% HPD 
regions for (log/coi log Sia) for each pixel in the two images. 


K to >30 K where 7co ~ 3 K km s“^. This increase in dust 
temperature is likely due to radiation from young stars em¬ 
bedded in the dense molecular medium. At lower densities, 
we also find high Tefr^i values. Although this may be due to 
a decline in self-shielding, in the regions with the lowest Jco 
the signal is close to the noise levels, so we should be careful 
not to over-interpret these data. At intermediate Ico, there 
is a large range in Teff,!, which is indicative of a mix of dense 
star forming regions, diffuse gas, and all material at inter¬ 
mediate densities. The SMC does not portray any strong 
trends as the Tefi.i estimates span a smaller range, though 
there is some hint of an increase in Teff.i at the highest Ico- 

One question is to what extent the dust surface density 
estimates affect Sir. Figure fT51 shows the posterior mean of 
the Ed,i — Ico trends in the SMC and LMC. Again, there 
is an unambiguous offset, with the SMC showing larger Ed,i 
per unit Ico- As discussed, the lower metallicity in the SMC 
results in fewer CO molecules for a given dust surface den¬ 
sity. Therefore, it is the combination of both higher temper¬ 
atures and higher surface densities per unit CO intensity, 
both due to the lower metallicity in the SMC, which leads 
to the higher normalisation in the 5ir— Ico trend shown in 
Figure fm 

The HB model provides estimates of all the latent vari¬ 
ables that determine the observed CO and IR intensities. 
Figure fTdl shows the modeled distributions and bivariate cor¬ 
relations of these variables from the LMC. The displayed 
model corresponds to the posterior mean. Though there are 
clearly strong trends between many of the variables besides 
the bivariate relationships discussed above, as we discuss in 
the next section, interpreting such correlations requires a 
complete consideration of the key assumptions that enter 
the model. 


Figure 12. Ico~ Tea,i relationship in the LMC (blue) and SMC 
(red). In order to better reveal the contrast between the Magel¬ 
lanic Clouds, we do not show contours for the LMC data. The 
points show Ico and Sir at each pixel from the mean of the 
posterior. 

5 DISCUSSION 

The HB modeling of dust SEDs and the dust - molecular 
gas relationship on 100 pc scales has revealed some interest¬ 
ing comparative features of the LMC and SMC. There is a 
strong anti-correlation between and j3eS,i in the LMC, 
with the SMC portraying a weaker anti-correlation, which 
may even be uncorrelated. Further, the slopes of Ico— Sir 
relationship are nearly identical, while Sir in the SMC is 
larger by a factor of ~3 at any given Ico- These results 
suggest that there are some fundamental differences in the 
properties of the dust and gas between the galaxies. 

Though the estimated pt ,/3 is significantly different be¬ 
tween the LMC and SMC, we should be careful to draw too 
strong a conclusion from this result. Since the SED model 
does not account for multiple components, the estimated 
/3eff,i does not necessarily correspond to the underlying spec¬ 
tral indices of any of the components along the LoS. Rather, 
it should be considered a convenient numerical parameter 
necessary for fitting the SED model to the observations. As 
S'lR is not sensitive to the estimate of /3efr,i, the modelled 
Ico— •S'lR also does not strongly depend on /3eff,i. The SED 
modelling results indicate that there is some fundamental 
difference in the properties of dust, but further analysis is 
necessary to identify the cause of these differences. Possible 
causes may be associated with differences in the magnitude 
of the 500 pm exc ess, which is not mo d elled here, the dust- 
to-ga s ratio (e.g. lOordon et al.l l2014l : iRoman-Duval et al.l 
l2014h . and/or the dominance of the trend of some subset of 
the data (e.g. at low or high Teiry). We have performed the 
HB fit to the SMC data excluding the 500 pm observation, 
but the resulting correlation and distributions are similar, 
suggesting that any 500 pm excess is not unduly affecting 
the estimated correlations in the high density regions traced 
by CO. More in-depth analysis of the observed IR maps may 
favor one of these or perhaps even other causes for the con¬ 
trast in Pt,i 3 , and whether the metallicity difference between 
the LMC and SMC is a significant contributing factor. 

That the estimated value of n in both galaxies are (ap- 
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Figure 13. Example Ico~ i relationship in the LMC and SMC. 
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Figure 14. Modelled relationships and distributions between the latent variables in the LMC. The panels are arranged in the same 
manner as Figure [S] 


proximately) equivalent may be indicative of a key similar¬ 
ity in the LMC and SMC. If Sm is considered to be a linear 
tracer of Esfr and Ico is a faithful tracer of Emoi, then 
the HB results suggests that the increase in the rate of star 
formation towards denser regions is similar in the LMC and 
SMC despite their difference in metallicity. 

Analyses of the KS relationship on large scales in nor¬ 
mal disk galaxies have revealed a range in n, with many 
gala xies portraying a sub-linear Esfr —E moi relationship 
fe.g. lBlanc et ai.ll2009l : lFord et al.ll2013l L In fact, when using 


the single 24 pm intensity or its combination with UV ob¬ 
servations, many galaxies in the HERACLES and STING 
surveys favor a sub-linear KS relationship, with signif i¬ 
cant galaxy-to-galax y variations dShettv et al.ll20l^ . l20149l . 
IShettv et al.l (l2014al ~) interpret these and other observational 
results as evidence for a substantial amount of non star¬ 
forming molecular gas traced by CO, with this diffuse gas 
fraction consis ting of at least 30% of the total molecular con¬ 
tent fsee also I Wilson et ^ 1 20121 : ICaldu-Primo et ^ l2013l : 
IPetv et al.ll2013l ~l~ Recent analyses of the Milky Way further 
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support s the presence of a significant diffuse molecu lar com¬ 
ponent (iRoman-Duval et al.l201^ : lLiszt et al.l201Clh . Most of 
the extra-galactic results utilized monochromatic SFR trac¬ 
ers, which could contribute to the differences between the 
estimated indices from those investigations and those pre¬ 
sented here. 

Another explicit difference is that the metallicities of 
the Magellanic Clouds are significantly lower than those of 
the normal spiral galaxies. That the Magellanic Clouds have 
a super-linear KS relationship may be due to their lower 
metallicity. In low metallicity systems, higher levels of pho¬ 
todissociation results in a dearth of CO trace d diffuse molec¬ 
ular g as compared to the normal spirals fe.g. larmier et al.l 
I 2 OI 0 I : ISchruba et al.l I 2 OI 2 I : I Glover fc Clarkll2012l i. Most re- 
gions with sufficient amounts of CO coincide with the loca¬ 
tions of star formation. Applying the HB model to extra- 
galactic surveys will further reveal any trends between the 
KS index and other galaxy properties, and will allow for 
a more direct comparison with the results presented here. 
Worthwhile comparisons of systems with diverse metallic¬ 
ities involving dust SEDs would also require consideration 
of the fraction of young stellar radiation that does not heat 
the dust, w hich is estimated to be higher in the Magellanic 
Clouds fsee lLawton et aI1l201Cll l. 

Such investigations on a larger sample of galaxies should 
further reveal how well the dust properties may be associ¬ 
ated with other characteristics of the ISM, such as the gas 
density, metallicity, and/or galaxy type. In these forthcom¬ 
ing analyses, however, we need to ensure that the model as¬ 
sumptions are fully considered in the interpretations. For in¬ 
stance, in this work we model the dust emission with single¬ 
component SEDs. As we are focusing on the dust and gas 
properties on large scales (100 pc), there is most certainly 
temperature variations along the LoS. Indeed, recent inves¬ 
tigations on SED modelling on smaller scales have suggested 
that models with multiple dust components can better re¬ 
produce the observed I R emission (either with varying Teff.j, 
;3eff.i , or both, see e.g. ICalliano et al.l I 2 OIII : I Gordon et al.l 
|20141 . Future efforts including a consideration of tempera¬ 
ture variations may provide additional insights into the large 
scale ISM. 


6 SUMMARY 

We have introduced a hierarchical Bayesian (HB) method 
to analyze FIR and CO observations. The HB method esti¬ 
mates parameters of the dust SED; using estimates of the 
dust surface density Ed,i, a single temperature Teff,i and 
spectral index /9eff,i, the method calculates the integrated 
FIR intensity Sir at each pixel. Furthermore, the method 
simultaneously estimates the linear regression parameters of 
the log Sir — log Ico relationship. When assuming that Sir 
and Ico faithfully trace the star formation rate Esfr and 
molecular gas surface density Emoi, respectively, the slope 
of this relationship is the Kennicutt-Schmidt index. 

We test the HB method on synthetic datasets (Section 
0. Even when the distributions of some key latent param¬ 
eters are not normal, which is assumed in the HB model, 
the range in estimated parameters include the true under¬ 
lying values. We also compare the HB results with common 
non-hierarchical techniques. Due to the degeneracy between 


Teff,i and /3eff,i in the SED, the fif produces a Tefi,i— /3eff,i 
distribution that is biased towards an anti-correlation. The 
HB method explicitly treats the correlation between Teiiy 
and /3efr,i among all pixels, and we demonstrate the posterior 
accurately recovers Pt,i 3 - Moreover, a fif fo the relation¬ 
ship between Ico and 5'ir, or any individual IR intensity, is 
biased towards smaller slopes. This occurs in part because 
the x^ analysis overestimates S'ir at low densities. 

We apply the HB fit to Herschel IR and NANTEN GO 
maps of the LMG and SMC at 100 pc scales. The main 
results of this first application of the HB methods are as 
follows; 

1) We find a stronger negative correlation between Teuy 
and /3eff,i in the LMG, with pT,p~ —0.74, compared to the 
SMC, with Pt,p~ —0.15 . These results reflect fundamental 
difference in the properties of dust and the structure of the 
ISM between the Magellanic Clouds. 

2) The slopes of the FIR — CO relationship for both 
galaxies are similar, falling in the range 1.1 — 1.7. However, 
in the SMC the intercept is nearly 0.4 dex higher. This differ¬ 
ence can be attributed to the lower metallicity of the SMC. 
Due to the paucity of CO, there is larger S'ir per unit Ico 
in the SMC compared to the LMC, where the metallicity is 
about two times larger. The lower metallicity in the SMC 
can also explain the higher overall temperatures and Sir at 
a given Ico- If there are fixed conversion factors (within a 
galaxy) between Ico and Emoi, and FIR and Esfr, then 
these results suggest that the Magellanic Clouds have simi¬ 
lar Kennicutt-Schmidt indices. 

3) In the LMC, the HB modelling reveals an increase in 
Teii,i in regions with the highest CO intensities. At Ico ~ 0.5 
K km s“^, Tefjy increases from ~20 K to >30 K where Ico~ 
3 K km s“^. This is indicative of increased dust heating at 
the densest regions, likely from newly born stars. There is 
also evidence for increasing Teiiy towards lower gas densities 
at Ico ~ 0.1 K km s“^, due to the waning influence of self¬ 
shielding in diffuse regions. 

We discuss the SED parameters and KS relationship 
in Section [5] Further investigation of the FIR intensity is 
necessary to understand the origins of the difference in pT,p 
between the two galaxies. The difference in KS slopes be¬ 
tween the irregular Magellanic Clouds, where n is clearly 
above unity, and normal disk galaxies where n is sub-linear, 
may be due to metallicity or other global galaxy properties. 
Similar hierarchical modelling of other galaxies will allow for 
a more direct comparison between the dust and gas proper¬ 
ties of the ISM under diverse galactic environments. 
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